Abstract

Consider the minimum mean-square error (MMSE) of estimating an arbitrary random variable from 
its observation contaminated by Gaussian noise. The MMSE can be regarded as a function of the signal- 
to-noise ratio (SNR) as well as a functional of the input distribution (of the random variable to be 
estimated). It is shown that the MMSE is concave in the input distribution at any given SNR. For a 
given input distribution, the MMSE is found to be infinitely differentiable at all positive SNR, and in
fact a real analytic function in SNR under mild conditions. The key to these regularity results is that the 
posterior distribution conditioned on the observation through Gaussian channels always decays at least 
as quickly as some Gaussian density. Furthermore, simple expressions for the first three derivatives of 
the MMSE with respect to the SNR are obtained. It is also shown that, as functions of the SNR, the 
curves for the MMSE of a Gaussian input and that of a non-Gaussian input cross at most once over all 
SNRs. These properties lead to simple proofs of the facts that Gaussian inputs achieve both the secrecy 
capacity of scalar Gaussian wiretap channels and the capacity of scalar Gaussian broadcast channels, as 
well as a simple proof of the entropy power inequality in the special case where one of the variables is 
Gaussian. 

Index Terms: Entropy, estimation, Gaussian noise, Gaussian broadcast channel, Gaussian wiretap chan- 
nel, minimum mean-square error (MMSE), mutual information. 



I. Introduction 

The concept of mean-square error has assumed a central role in the theory and practice of estimation
since the time of Gauss and Legendre. In particular, minimization of mean-square error underlies numerous
methods in statistical sciences. The focus of this paper is the minimum mean-square error (MMSE)
of estimating an arbitrary random variable contaminated by additive Gaussian noise.

Let $(X, Y)$ be random variables with arbitrary joint distribution. Throughout the paper, $E\{\cdot\}$ denotes
the expectation with respect to the joint distribution of all random variables in the braces, and $E\{X|Y\}$
denotes the conditional mean estimate of X given Y. The corresponding conditional variance is a function
of Y which is denoted by

$$\mathrm{var}\{X|Y\} = E\{ (X - E\{X|Y\})^2 \mid Y \}. \tag{1}$$

It is well known that the conditional mean estimate is optimal in the mean-square sense. In fact, the
MMSE of estimating X given Y is nothing but the average conditional variance:

$$\mathrm{mmse}(X|Y) = E\{ \mathrm{var}\{X|Y\} \}. \tag{2}$$
Fig. 1. The MMSE of Gaussian and binary inputs as a function of the SNR.



In this paper, we are mainly interested in random variables related through models of the following
form:

$$Y = \sqrt{\mathrm{snr}}\, X + N \tag{3}$$

where $N \sim \mathcal{N}(0, 1)$ is standard Gaussian throughout this paper unless otherwise stated. The MMSE of
estimating the input X of the model given the noisy output Y is alternatively denoted by:

$$\mathrm{mmse}(X, \mathrm{snr}) = \mathrm{mmse}(X \mid \sqrt{\mathrm{snr}}\, X + N) \tag{4}$$
$$= E\{ (X - E\{X \mid \sqrt{\mathrm{snr}}\, X + N\})^2 \}. \tag{5}$$

The MMSE mmse(X, snr) can be regarded as a function of the signal-to-noise ratio (SNR) for every given
distribution $P_X$, and as a functional of the input distribution $P_X$ for every given SNR. In particular, for
a Gaussian input with mean m and variance $\sigma^2$, denoted by $X \sim \mathcal{N}(m, \sigma^2)$,

$$\mathrm{mmse}(X, \mathrm{snr}) = \frac{\sigma^2}{1 + \sigma^2\, \mathrm{snr}}. \tag{6}$$

If X is equally likely to take $\pm 1$, then

$$\mathrm{mmse}(X, \mathrm{snr}) = 1 - \int_{-\infty}^{\infty} \frac{e^{-y^2/2}}{\sqrt{2\pi}} \tanh(\mathrm{snr} - \sqrt{\mathrm{snr}}\, y)\, dy. \tag{7}$$

The function mmse(X, snr) is illustrated in Fig. 1 for four special inputs: the standard Gaussian variable,
a Gaussian variable with variance 1/4, as well as symmetric and asymmetric binary random variables,
all of zero mean.
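As a rough numerical illustration of the closed form for Gaussian inputs and the integral formula for equiprobable $\pm 1$ inputs, the following sketch evaluates both. The function names, the integration window of $\pm 12$ and the trapezoidal rule are illustrative choices made here, not part of the original development.

```python
import math

def mmse_gaussian(snr, variance=1.0):
    """Closed-form MMSE of a Gaussian input with the given variance."""
    return variance / (1.0 + variance * snr)

def mmse_binary(snr, half_width=12.0, steps=4800):
    """MMSE of X equally likely to be +1 or -1, via the tanh integral."""
    h = 2.0 * half_width / steps
    root = math.sqrt(snr)
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal end weights
        density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        total += weight * density * math.tanh(snr - root * y)
    return 1.0 - h * total

if __name__ == "__main__":
    for snr in (0.0, 0.5, 1.0, 4.0):
        print(snr, mmse_gaussian(snr), mmse_binary(snr))
```

At every positive SNR the binary value should fall below the unit-variance Gaussian value, in line with Fig. 1.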

Optimal estimation intrinsically underlies many fundamental information theoretic results, which describe
the boundary between what is achievable and what is not, given unlimited computational power.
Simple quantitative connections between the MMSE and information measures were revealed in [1]. One
such result is that, for arbitrary but fixed $P_X$,

$$\mathrm{mmse}(X, \mathrm{snr}) = 2\, \frac{d}{d\,\mathrm{snr}}\, I(X; \sqrt{\mathrm{snr}}\, X + N). \tag{8}$$

This relationship implies the following integral expression for the mutual information:

$$I(X; \sqrt{\mathrm{snr}}\, g(X) + N) = \frac{1}{2} \int_0^{\mathrm{snr}} \mathrm{mmse}(g(X), \gamma)\, d\gamma \tag{9}$$

which holds for any one-to-one real-valued function g. By sending $\mathrm{snr} \to \infty$ in (9), we find the entropy
of every discrete random variable X can be expressed as (see [1], [2]):

$$H(X) = \frac{1}{2} \int_0^{\infty} \mathrm{mmse}(g(X), \gamma)\, d\gamma \tag{10}$$

whereas the differential entropy of any continuous random variable X can be expressed as:

$$h(X) = \frac{\log(2\pi e)}{2} - \frac{1}{2} \int_0^{\infty} \left[ \frac{1}{1+\gamma} - \mathrm{mmse}(X, \gamma) \right] d\gamma. \tag{11}$$
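The integral relationship (9) can be checked numerically for an equiprobable $\pm 1$ input with $g(x) = x$. The sketch below compares half the integral of the MMSE with the mutual information computed directly. The direct expression $I = \mathrm{snr} - E\{\log\cosh(\mathrm{snr} - \sqrt{\mathrm{snr}}\, N)\}$ (in nats) is a standard formula for the binary-input Gaussian channel that is assumed here, not taken from this text; the helper names and quadrature settings are likewise illustrative.

```python
import math

def gauss_expect(func, half_width=12.0, steps=2400):
    """E{func(N)} for N ~ N(0,1), by the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * y * y) * func(y)
    return h * total / math.sqrt(2.0 * math.pi)

def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def mmse_binary(snr):
    root = math.sqrt(snr)
    return 1.0 - gauss_expect(lambda y: math.tanh(snr - root * y))

def mutual_info_binary(snr):
    """I(X; sqrt(snr) X + N) in nats for X = +/-1 (assumed closed form)."""
    root = math.sqrt(snr)
    return snr - gauss_expect(lambda y: log_cosh(snr - root * y))

def half_integral_of_mmse(snr, steps=200):
    """(1/2) * integral of mmse over [0, snr], by Simpson's rule (steps even)."""
    h = snr / steps
    total = mmse_binary(0.0) + mmse_binary(snr)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * mmse_binary(k * h)
    return 0.5 * total * h / 3.0
```

For example, `half_integral_of_mmse(2.0)` and `mutual_info_binary(2.0)` should agree to several decimal places, and both stay below $\log 2$ nats.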

The preceding information-estimation relationships have found a number of applications, e.g., in nonlinear
filtering [1], [3], in multiuser detection [4], in power allocation over parallel Gaussian channels [5],
[6], in the proof of Shannon's entropy power inequality (EPI) and its generalizations [2], [7], [8], and
in the treatment of the capacity region of several multiuser channels [9]-[11]. Relationships between
relative entropy and mean-square error are also found in [12], [13]. Moreover, many such results have
been generalized to vector-valued inputs and multiple-input multiple-output (MIMO) models [1], [7],
[14].

Partially motivated by the important role played by the MMSE in information theory, this paper 
presents a detailed study of the key mathematical properties of mmse(X, snr). The remainder of the 
paper is organized as follows. 

In Section II, we establish bounds on the MMSE as well as on the conditional and unconditional
moments of the conditional mean estimation error. In particular, it is shown that the tail of the posterior
distribution of the input given the observation vanishes at least as quickly as that of some Gaussian
density. Simple properties of input shift and scaling are also shown.

In Section III, mmse(X, snr) is shown to be an infinitely differentiable function of snr on $(0, \infty)$ for
every input distribution regardless of the existence of its moments (even the mean and variance of the
input can be infinite). Furthermore, under certain conditions, the MMSE is found to be real analytic at
all positive SNRs, and hence can be arbitrarily well-approximated by its Taylor series expansion.

In Section IV, the first three derivatives of the MMSE with respect to the SNR are expressed in terms
of the average central moments of the input conditioned on the output. The result is then extended to
the conditional MMSE.

Section V shows that the MMSE is concave in the distribution $P_X$ at any given SNR. The monotonicity
of the MMSE of a partial sum of independent identically distributed (i.i.d.) random variables is also
investigated. It is well known that the MMSE of a non-Gaussian input is dominated by the MMSE of a
Gaussian input of the same variance. It is further shown in this paper that the MMSE curve of a
non-Gaussian input and that of a Gaussian input cross each other at most once over $\mathrm{snr} \in (0, \infty)$, regardless
of their variances.

In Section VI, properties of the MMSE are used to establish Shannon's EPI in the special case where
one of the variables is Gaussian. Sidestepping the EPI, the properties of the MMSE lead to simple and
natural proofs of the fact that Gaussian input is optimal for both the Gaussian wiretap channel and the
scalar Gaussian broadcast channel.

II. Basic Properties 

A. The MMSE 

The input X and the observation Y in the model described by $Y = \sqrt{\mathrm{snr}}\, X + N$ are tied probabilistically
by the conditional Gaussian probability density function:

$$p_{Y|X}(y|x; \mathrm{snr}) = \varphi(y - \sqrt{\mathrm{snr}}\, x) \tag{12}$$

where $\varphi$ stands for the standard Gaussian density:

$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}. \tag{13}$$

Let us define for every $a \in \mathbb{R}$ and $i = 0, 1, \ldots$,

$$h_i(y; a) = E\{ X^i \varphi(y - aX) \} \tag{14}$$

which is always well defined because $\varphi(y - ax)$ is bounded and vanishes quadratic exponentially fast
as either x or y becomes large with the other variable bounded. In particular, $h_0(y; \sqrt{\mathrm{snr}})$ is nothing but
the marginal density of the observation Y, which is always strictly positive. The conditional mean
estimate can be expressed as [1], [4]:

$$E\{X \mid Y = y\} = \frac{h_1(y; \sqrt{\mathrm{snr}})}{h_0(y; \sqrt{\mathrm{snr}})} \tag{15}$$

and the MMSE can be calculated as:

$$\mathrm{mmse}(X, \mathrm{snr}) = \int_{\mathbb{R}} \int_{-\infty}^{\infty} \left( x - \frac{h_1(y; \sqrt{\mathrm{snr}})}{h_0(y; \sqrt{\mathrm{snr}})} \right)^2 \varphi(y - \sqrt{\mathrm{snr}}\, x)\, dy\, dP_X(x) \tag{16}$$

which can be simplified if $E\{X^2\} < \infty$:

$$\mathrm{mmse}(X, \mathrm{snr}) = E\{X^2\} - \int_{-\infty}^{\infty} \frac{h_1^2(y; \sqrt{\mathrm{snr}})}{h_0(y; \sqrt{\mathrm{snr}})}\, dy. \tag{17}$$

Note that the estimation error $X - E\{X|Y\}$ remains the same if X is subject to a constant shift.
Hence the following well-known fact:

Proposition 1: For every random variable X and $a \in \mathbb{R}$,

$$\mathrm{mmse}(X + a, \mathrm{snr}) = \mathrm{mmse}(X, \mathrm{snr}). \tag{18}$$

The following is also straightforward from the definition of MMSE.
Proposition 2: For every random variable X and $a \in \mathbb{R}$,

$$\mathrm{mmse}(aX, \mathrm{snr}) = a^2\, \mathrm{mmse}(X, a^2\, \mathrm{snr}). \tag{19}$$
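For finite-alphabet inputs, $h_0$ and $h_1$ in (14) are finite sums, so the simplified expression (17) can be evaluated by one-dimensional quadrature. The sketch below does so and checks shift invariance and the scaling identity $\mathrm{mmse}(aX, \mathrm{snr}) = a^2\, \mathrm{mmse}(X, a^2 \mathrm{snr})$ on an arbitrary three-point input. The function name `mmse_discrete`, the example distribution and the integration window are assumptions made for illustration only.

```python
import math

def phi(t):
    """Standard Gaussian density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def mmse_discrete(points, probs, snr, steps=6000):
    """MMSE of a finite-alphabet input: E{X^2} minus the integral of h1^2/h0."""
    a = math.sqrt(snr)
    lo = a * min(points) - 10.0        # window covering all shifted Gaussians
    hi = a * max(points) + 10.0
    h = (hi - lo) / steps
    integral = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        h0 = sum(p * phi(y - a * x) for x, p in zip(points, probs))
        h1 = sum(p * x * phi(y - a * x) for x, p in zip(points, probs))
        weight = 0.5 if k in (0, steps) else 1.0
        integral += weight * h1 * h1 / h0
    second_moment = sum(p * x * x for x, p in zip(points, probs))
    return second_moment - h * integral

points, probs, snr = [-1.0, 0.0, 2.0], [0.3, 0.5, 0.2], 0.7
base = mmse_discrete(points, probs, snr)
shifted = mmse_discrete([x + 5.0 for x in points], probs, snr)
scaled = mmse_discrete([3.0 * x for x in points], probs, snr)
rescaled = 9.0 * mmse_discrete(points, probs, 9.0 * snr)
```

Here `shifted` should match `base`, and `scaled` should match `rescaled`, up to quadrature error.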



B. The Conditional MMSE and SNR Increment 

For any pair of jointly distributed variables $(X, U)$, the conditional MMSE of estimating X at SNR
$\gamma > 0$ given U is defined as:

$$\mathrm{mmse}(X, \gamma | U) = E\{ (X - E\{X \mid \sqrt{\gamma}\, X + N, U\})^2 \} \tag{20}$$

where $N \sim \mathcal{N}(0, 1)$ is independent of $(X, U)$. It can be regarded as the MMSE achieved with side
information U available to the estimator. For every u, let $X_u$ denote a random variable indexed by u
with distribution $P_{X|U=u}$. Then the conditional MMSE can be seen as an average:

$$\mathrm{mmse}(X, \mathrm{snr} | U) = \int \mathrm{mmse}(X_u, \mathrm{snr})\, P_U(du). \tag{21}$$

A special type of conditional MMSE is obtained when the side information is itself a noisy observation
of X through an independent additive Gaussian noise channel. It has long been noticed that two
independent looks through Gaussian channels are equivalent to a single look at the sum SNR, e.g., in
the context of maximum-ratio combining. As far as the MMSE is concerned, the SNRs of the direct
observation and the side information simply add up.



Fig. 2. An incremental Gaussian channel.



Proposition 3: For every X and every $\mathrm{snr}, \gamma > 0$,

$$\mathrm{mmse}(X, \gamma \mid \sqrt{\mathrm{snr}}\, X + N) = \mathrm{mmse}(X, \mathrm{snr} + \gamma) \tag{22}$$

where $N \sim \mathcal{N}(0, 1)$ is independent of X.

Proposition 3 enables translation of the MMSE at any given SNR to a conditional MMSE at a
smaller SNR. This result was first shown in [1] using the incremental channel technique, and has been
instrumental in the proof of information-estimation relationships such as (8). Proposition 3 is also the
key to the regularity properties and the derivatives of the MMSE presented in subsequent sections. A
brief proof of the result is included here for completeness.

Proof of Proposition 3: Consider a cascade of two Gaussian channels as depicted in Fig. 2:

$$Y_{\mathrm{snr}+\gamma} = X + \sigma_1 N_1 \tag{23a}$$
$$Y_{\mathrm{snr}} = Y_{\mathrm{snr}+\gamma} + \sigma_2 N_2 \tag{23b}$$

where X is the input, and $N_1$ and $N_2$ are independent standard Gaussian random variables. A subscript is used
to explicitly denote the SNR at which each observation is made. Let $\sigma_1, \sigma_2 > 0$ satisfy $\sigma_1^2 = 1/(\mathrm{snr} + \gamma)$
and $\sigma_1^2 + \sigma_2^2 = 1/\mathrm{snr}$ so that the SNR of the first channel (23a) is $\mathrm{snr} + \gamma$ and that of the composite
channel is snr. A linear combination of (23a) and (23b) yields

$$(\mathrm{snr} + \gamma)\, Y_{\mathrm{snr}+\gamma} = \mathrm{snr}\, Y_{\mathrm{snr}} + \gamma X + \sqrt{\gamma}\, W \tag{24}$$

where we have defined $W = (\gamma \sigma_1 N_1 - \mathrm{snr}\, \sigma_2 N_2)/\sqrt{\gamma}$. Clearly, the input-output relationship defined
by the incremental channel (23) is equivalently described by (24) paired with (23b). Due to mutual
independence of $(X, N_1, N_2)$, it is easy to see that W is standard Gaussian and $(X, W, \sigma_1 N_1 + \sigma_2 N_2)$
are mutually independent. Thus W is independent of $(X, Y_{\mathrm{snr}})$ by (23). Based on the above observations,
the relationship of X and $Y_{\mathrm{snr}+\gamma}$ conditioned on $Y_{\mathrm{snr}} = y$ is exactly the input-output relationship of
a Gaussian channel with SNR equal to $\gamma$ described by (24) with $Y_{\mathrm{snr}} = y$. Because $Y_{\mathrm{snr}}$ is a physical
degradation of $Y_{\mathrm{snr}+\gamma}$, providing $Y_{\mathrm{snr}}$ as the side information does not change the overall MMSE, that
is, $\mathrm{mmse}(X | Y_{\mathrm{snr}+\gamma}) = \mathrm{mmse}(X, \gamma | Y_{\mathrm{snr}})$, which proves (22).
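The two facts about W used in the proof (unit variance, and zero correlation with the aggregate noise $\sigma_1 N_1 + \sigma_2 N_2$, which for jointly Gaussian variables gives independence) reduce to short algebraic identities in $\sigma_1^2$ and $\sigma_2^2$. The sketch below verifies them in exact rational arithmetic for chosen values of snr and $\gamma$; the function name is illustrative.

```python
from fractions import Fraction

def incremental_channel_check(snr, gamma):
    """Return (Var(W), sqrt(gamma) * Cov(W, s1*N1 + s2*N2)) exactly."""
    snr, gamma = Fraction(snr), Fraction(gamma)
    var1 = 1 / (snr + gamma)       # sigma_1^2 = 1 / (snr + gamma)
    var2 = 1 / snr - var1          # sigma_2^2, from sigma_1^2 + sigma_2^2 = 1 / snr
    # W = (gamma * sigma_1 * N_1 - snr * sigma_2 * N_2) / sqrt(gamma)
    var_w = (gamma ** 2 * var1 + snr ** 2 * var2) / gamma
    # Covariance with sigma_1 N_1 + sigma_2 N_2, scaled by sqrt(gamma)
    scaled_cov = gamma * var1 - snr * var2
    return var_w, scaled_cov

print(incremental_channel_check(Fraction(3, 2), Fraction(5, 7)))
```

The result should be a variance of exactly 1 and a covariance of exactly 0 for any positive snr and $\gamma$.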



C. Bounds 

The input to a Gaussian model with nonzero SNR can always be estimated with finite mean-square
error based on the output, regardless of the input distribution. In fact, $\hat{X} = Y/\sqrt{\mathrm{snr}}$ achieves mean-square
error of 1/snr, even if $E\{X\}$ does not exist. Moreover, the trivial zero estimate achieves mean-square
error of $E\{X^2\}$.

Proposition 4: For every input X,

$$\mathrm{mmse}(X, \mathrm{snr}) \le \frac{1}{\mathrm{snr}} \tag{25}$$

and in case the input variance $\mathrm{var}\{X\}$ is finite,

$$\mathrm{mmse}(X, \mathrm{snr}) \le \min\left\{ \mathrm{var}\{X\}, \frac{1}{\mathrm{snr}} \right\}. \tag{26}$$

Proposition 4 can also be established using the fact that $\mathrm{snr} \cdot \mathrm{mmse}(X, \mathrm{snr}) = \mathrm{mmse}(N \mid \sqrt{\mathrm{snr}}\, X + N) \le 1$,
which is simply because the estimation error of the input is proportional to the estimation error of the
noise [7]:

$$\sqrt{\mathrm{snr}}\, (X - E\{X|Y\}) = E\{N|Y\} - N. \tag{27}$$

Using (27) and known moments of the Gaussian density, higher moments of the estimation errors can
also be bounded as shown in Appendix A.

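Proposition 5: For every random variable X and $\mathrm{snr} > 0$,

$$E\{ |X - E\{X \mid \sqrt{\mathrm{snr}}\, X + N\}|^n \} \le \left( \frac{2}{\sqrt{\mathrm{snr}}} \right)^n \sqrt{n!} \tag{28}$$

for every $n = 0, 1, \ldots$, where $N \sim \mathcal{N}(0, 1)$ is independent of X.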
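As a sanity check in one case where everything is available in closed form, take a standard Gaussian input: the estimation error is then Gaussian with variance $1/(1 + \mathrm{snr})$, and its absolute moments can be compared with a bound of the form $(2/\sqrt{\mathrm{snr}})^n \sqrt{n!}$. That reading of the bound in Proposition 5 is an assumption of this sketch, as are the function names.

```python
import math

def abs_gaussian_moment(n, variance=1.0):
    """E|Z|^n for Z ~ N(0, variance)."""
    return (2.0 * variance) ** (n / 2.0) * math.gamma((n + 1) / 2.0) / math.sqrt(math.pi)

def error_moment_gaussian_input(n, snr):
    """n-th absolute moment of the MMSE error when X ~ N(0, 1)."""
    return abs_gaussian_moment(n, 1.0 / (1.0 + snr))

def moment_bound(n, snr):
    """Assumed form of the bound: (2 / sqrt(snr))^n * sqrt(n!)."""
    return (2.0 / math.sqrt(snr)) ** n * math.sqrt(math.factorial(n))

for snr in (0.1, 1.0, 10.0):
    for n in (1, 2, 4, 8):
        print(snr, n, error_moment_gaussian_input(n, snr), moment_bound(n, snr))
```

In this Gaussian case the bound is loose by a wide margin, which is expected since it must hold for every input distribution.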

In order to show some useful characteristics of the posterior input distribution, it is instructive to
introduce the notion of sub-Gaussianity. A random variable X is called sub-Gaussian if the tail of its
distribution is dominated by that of some Gaussian random variable, i.e.,

$$P(|X| > \lambda) \le C e^{-c\lambda^2} \tag{29}$$

for some $c, C > 0$ and all $\lambda > 0$. Sub-Gaussianity can be equivalently characterized by the requirement
that the growth of moments or moment generating functions does not exceed that of some Gaussian [15, Theorem 2].
Lemma 1: The following statements are equivalent:

1) X is sub-Gaussian;

2) There exists $C > 0$ such that for every $k = 1, 2, \ldots$,

$$E\{|X|^k\} \le C^k \sqrt{k!}; \tag{30}$$

3) There exist $c, C > 0$ such that for all $t > 0$,

$$E\{e^{tX}\} \le C e^{c t^2}. \tag{31}$$

Regardless of the prior input distribution, the posterior distribution of the input given the noisy
observation through a Gaussian channel is always sub-Gaussian, and the posterior moments can be upper
bounded. This is formalized in the following result proved in Appendix B.

Proposition 6: Let $X_y$ be distributed according to $P_{X|Y=y}$ where $Y = aX + N$, $N \sim \mathcal{N}(0, 1)$ is
independent of X, and $a \neq 0$. Then $X_y$ is sub-Gaussian for every $y \in \mathbb{R}$. Moreover,

$$P\{|X_y| \ge x\} \le \sqrt{\frac{2}{\pi}}\, \frac{e^{y^2/2}}{h_0(y; a)}\, e^{-a^2 x^2/4} \tag{32}$$

and, for every $n = 1, 2, \ldots$,

$$E\{|X_y|^n\} \le \frac{n\, e^{y^2/2}}{h_0(y; a)} \left( \frac{\sqrt{2}}{|a|} \right)^n \sqrt{(n-1)!} \tag{33}$$

and

$$E\{|X_y - E\{X_y\}|^n\} \le 2^n E\{|X_y|^n\}. \tag{34}$$
III. Smoothness and Analyticity

This section studies the regularity of the MMSE as a function of the SNR, where the input distribution
is arbitrary but fixed. In particular, it is shown that mmse(X, snr) is a smooth function of snr on $(0, \infty)$
for every $P_X$. This conclusion clears the way towards calculating its derivatives in Section IV. Under
certain technical conditions, the MMSE is also found to be real analytic in snr. This implies that the
MMSE can be reconstructed from its local derivatives. As we shall see, the regularity of the MMSE at
the point of zero SNR requires additional conditions.



A. Smoothness 

Proposition 7: For every X, mmse(X, snr) is infinitely differentiable at every $\mathrm{snr} > 0$. If $E\{|X|^{k+1}\} < \infty$,
then mmse(X, snr) is k times right-differentiable at $\mathrm{snr} = 0$. Consequently, mmse(X, snr) is infinitely
right-differentiable at $\mathrm{snr} = 0$ if all moments of X are finite.

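Proof: The proof is divided into two parts. In the first part we establish the smoothness assuming
that all input moments are finite, i.e., $E\{|X|^n\} < \infty$ for all $n = 1, 2, \ldots$.

For convenience, let $Y = aX + N$ where $a^2 = \mathrm{snr}$. For every $i = 0, 1, \ldots$, denote

$$g_i(y; a) = \frac{\partial^i}{\partial a^i} \left( \frac{h_1^2}{h_0} \right)(y; a) \tag{35}$$

and

$$m_i(a) = \int_{-\infty}^{\infty} g_i(y; a)\, dy \tag{36}$$

where $h_i$ is given by (14). By (17), we have

$$\mathrm{mmse}(X, a^2) = E\{X^2\} - m_0(a). \tag{37}$$

We denote by $H_n$ the n-th Hermite polynomial [16, Section 5.5]:

$$H_n(x) = \frac{(-1)^n}{\varphi(x)} \frac{d^n \varphi(x)}{dx^n} \tag{38}$$
$$= n! \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{(-1)^k}{2^k\, k!\, (n-2k)!}\, x^{n-2k}. \tag{39}$$

Denote $h_i^{(n)}(y; a) = \partial^n h_i(y; a)/\partial a^n$ throughout the paper. Then

$$\frac{h_i^{(n)}(y; a)}{h_0(y; a)} = \frac{1}{h_0(y; a)}\, E\{ X^{i+n} H_n(y - aX)\, \varphi(y - aX) \} \tag{40}$$
$$= E\{ X^{i+n} H_n(N) \mid Y = y \} \tag{41}$$

where the derivative and expectation can be exchanged to obtain (40) because the product of any
polynomial and the Gaussian density is bounded.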
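With $\varphi$ the standard Gaussian density, the definition (38) yields polynomials satisfying the three-term recurrence $H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)$ with $H_0 = 1$ and $H_1 = x$. The sketch below builds the coefficients from that recurrence and checks, in exact integer arithmetic, the second-moment identity $E\{H_n^2(N)\} = n!$ for standard Gaussian N. The recurrence is a standard property assumed here, and the helper names are illustrative.

```python
import math

def hermite_coeffs(n):
    """Coefficients of H_n, lowest degree first, from the three-term recurrence."""
    prev, cur = [1], [0, 1]              # H_0 = 1, H_1 = x
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur                  # x * H_k
        for j, c in enumerate(prev):     # minus k * H_{k-1}
            nxt[j] -= k * c
        prev, cur = cur, nxt
    return cur

def gaussian_moment(m):
    """E{N^m} for N ~ N(0,1): zero for odd m, (m-1)!! for even m."""
    if m % 2:
        return 0
    return math.prod(range(1, m, 2))

def second_moment_of_hermite(n):
    """E{H_n(N)^2}, computed exactly from the coefficients."""
    c = hermite_coeffs(n)
    return sum(c[i] * c[j] * gaussian_moment(i + j)
               for i in range(len(c)) for j in range(len(c)))

print(hermite_coeffs(3), second_moment_of_hermite(3))
```

For instance, `hermite_coeffs(3)` represents $x^3 - 3x$, whose second moment under the standard Gaussian is $3! = 6$.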
The following lemma is established in Appendix C.

Lemma 2: For every $i = 0, 1, \ldots$ and all $w > v$, $(y, a) \mapsto g_i(y; a)$ is integrable on $\mathbb{R} \times [v, w]$.
Using Lemma 2 and (36), we have

$$\int_v^w m_{i+1}(a)\, da = \int_v^w \int_{-\infty}^{\infty} g_{i+1}(y; a)\, dy\, da \tag{42}$$
$$= \int_{-\infty}^{\infty} [ g_i(y; w) - g_i(y; v) ]\, dy \tag{43}$$
$$= m_i(w) - m_i(v) \tag{44}$$

where (43) is due to (35) and Fubini's theorem. Therefore for every $i \ge 0$, $m_i$ is continuous. Hence for
each $a \in \mathbb{R}$,

$$\frac{d m_i(a)}{da} = m_{i+1}(a) \tag{45}$$

follows from the fundamental theorem of calculus [17, p. 97]. In view of (37), we have

$$\frac{d^i\, \mathrm{mmse}(X, a^2)}{da^i} = -m_i(a). \tag{46}$$

This proves that $a \mapsto \mathrm{mmse}(X, a^2) \in C^\infty(\mathbb{R})$, which implies that mmse(X, snr) is infinitely differentiable
in snr on $(0, \infty)$.

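In the second part of this proof, we eliminate the requirement that all moments of the input exist by
resorting to the incremental-SNR result, Proposition 3. Fix arbitrary $\gamma > 0$ and let $Y_\gamma = \sqrt{\gamma}\, X + N$. For
every $u \in \mathbb{R}$, let $X_{u;\gamma} \sim P_{X|Y_\gamma = u}$. By (17), (21) and Proposition 3, we have

$$\mathrm{mmse}(X, \gamma + a^2) = \int \mathrm{mmse}(X_{u;\gamma}, a^2)\, P_{Y_\gamma}(du) \tag{47}$$
$$= E\{X^2\} - \tilde{m}_0(a) \tag{48}$$

where

$$h_i(y; a | u; \gamma) = E\{ X^i \varphi(y - aX) \mid Y_\gamma = u \} \tag{49}$$

$$g_i(y; a | u; \gamma) = \frac{\partial^i}{\partial a^i} \left( \frac{h_1^2}{h_0} \right)(y; a | u; \gamma) \tag{50}$$

and

$$\tilde{m}_i(a) = \int_{\mathbb{R}} \int_{\mathbb{R}} g_i(y; a | u; \gamma)\, dy\; h_0(u; \sqrt{\gamma})\, du \tag{51}$$

for $i = 0, 1, \ldots$. By Proposition 5, for each u, all moments of $X_{u;\gamma}$ are finite. Each $\tilde{m}_i$ is a well-defined
real-valued function on $\mathbb{R}$. Repeating the first part of this proof with $h_i(y; a)$ replaced by $h_i(y; a | u; \gamma)$,
we conclude that $a \mapsto \mathrm{mmse}(X, \gamma + a^2)$ is $C^\infty$ in a at least on $|a| > \sqrt{\gamma}$, which further implies
that $a \mapsto \mathrm{mmse}(X, a^2) \in C^\infty(\mathbb{R} \setminus [-\sqrt{2\gamma}, \sqrt{2\gamma}])$, because $a \mapsto \sqrt{a^2 - \gamma}$ has bounded derivatives of all
order when $|a| > \sqrt{2\gamma}$. By the arbitrariness of $\gamma$, we have $a \mapsto \mathrm{mmse}(X, a^2) \in C^\infty(\mathbb{R} \setminus \{0\})$, hence
$\mathrm{mmse}(X, \cdot) \in C^\infty((0, \infty))$.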

Finally, we address the case of zero SNR. It follows from (41) and the independence of X and Y at
zero SNR that

$$\frac{1}{h_0} \frac{\partial^n h_i}{\partial a^n}(y; 0) = E\{X^{i+n}\}\, H_n(y). \tag{52}$$

Since $E\{|H_n(N)|\} \le \sqrt{E\{H_n^2(N)\}} = \sqrt{n!}$ is always finite, induction reveals that the n-th derivative of
$m_0$ at 0 depends on the first $n + 1$ moments of X. By Taylor's theorem and the fact that $m_0(a)$ is an
even function of a, we have

$$m_0(a) = \sum_{j=0}^{i} \frac{m_{2j}(0)}{(2j)!}\, a^{2j} + O(|a|^{2i+2}) \tag{53}$$

in the vicinity of $a = 0$, which implies that $m_0$ is i times differentiable with respect to $a^2$ at 0, with
$d^i m_0(0^+)/d(a^2)^i = \frac{i!}{(2i)!}\, m_{2i}(0)$, as long as $E\{X^{i+1}\} < \infty$. ■
B. Real Analyticity

A function $f : \mathbb{R} \to \mathbb{R}$ is said to be real analytic at $x_0$ if it can be represented by a convergent power
series in some neighborhood of $x_0$, i.e., there exists $\delta > 0$ such that

$$f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n \tag{54}$$

for every $x \in (x_0 - \delta, x_0 + \delta)$. One necessary and sufficient condition for f to be real analytic is that
f can be extended to some open disk $D(x_0, \delta) = \{z \in \mathbb{C} : |z - x_0| < \delta\}$ in the complex plane by the
power series (54) [18].

Proposition 8: As a function of a, $\mathrm{mmse}(X, a^2)$ is real analytic at $a_0 \in \mathbb{R}$ if either one of the following
two sets of conditions holds:

1) X is sub-Gaussian, and there exist $c > 0$ and $r > 0$ such that for every $y \in \mathbb{R}$,

$$\inf_{z \in D(a_0, r)} |h_0(y; z)| > 0 \tag{55}$$

and

$$\liminf_{|y| \to \infty}\; \inf_{z \in D(a_0, r)} \frac{|h_0(y; z)|}{h_0(y; \mathrm{Re}(z))} > c \tag{56}$$

2) $a_0 \neq 0$, and there exist $c > 0$, $r > 0$ and $\delta \in (0, a_0^2)$ such that for every $y, u \in \mathbb{R}$,

$$\inf_{z \in D(a_0, r)} |h_0(y; z | u; \delta)| > 0 \tag{57}$$

and

$$\liminf_{|y| \to \infty}\; \inf_{z \in D(a_0, r)} \frac{|h_0(y; z | u; \delta)|}{h_0(y; \mathrm{Re}(z) | u; \delta)} > c. \tag{58}$$

Moreover, whenever $\mathrm{mmse}(X, a^2)$ is real analytic at $a \in \mathbb{R}$, the function mmse(X, snr) is also analytic
at $\mathrm{snr} = a^2$.

The last statement in Proposition 8 is because of the following. The Taylor series expansion of
$\mathrm{mmse}(X, a^2)$ at $a = 0$ is an even function, so that the analyticity of $\mathrm{mmse}(X, a^2)$ at $a = 0$ implies
the analyticity of mmse(X, snr) at $\mathrm{snr} = 0$. If $\mathrm{mmse}(X, a^2)$ is analytic at $a \neq 0$, then mmse(X, snr) is
also analytic at $\mathrm{snr} = a^2$ because $\mathrm{snr} \mapsto \sqrt{\mathrm{snr}}$ is real analytic at $\mathrm{snr} > 0$, and composition of analytic
functions is analytic [19]. It remains to establish the analyticity of $a \mapsto \mathrm{mmse}(X, a^2)$, which is relegated
to Appendix D.

Conditions (55) and (56) can be understood as follows. Recall that $h_0(y; a)$ denotes the density of
$Y = aX + N$. The function $h_0(y; a)$ stays positive for all $a \in \mathbb{R}$, and decays no faster than the Gaussian
density. However, $h_0(y; a)$ may vanish for some $a \in \mathbb{C}$, so that the MMSE may not be extendable to the
complex plane. Hence the purpose of (55) and (56) is to ensure that the imaginary part of a has limited
impact on $|h_0|$.

As an example, consider the case where X is equiprobable on $\{\pm 1\}$. Then

$$h_0(y; a) = \varphi(y) \exp(-a^2/2) \cosh(ay). \tag{59}$$

Letting $a = jt$ yields $h_0(y; jt) = \varphi\big(\sqrt{y^2 - t^2}\big) \cos(ty)$, which has infinitely many zeros. In fact, in this
case the MMSE is given by (7), or in an equivalent form:

$$\mathrm{mmse}(X, a^2) = 1 - \int_{-\infty}^{\infty} \varphi(y) \tanh(a^2 - ay)\, dy. \tag{60}$$

Then for any $r > 0$, there exists $|a_0| < r$ and $y_0 \in \mathbb{R}$, such that $a_0^2 - a_0 y_0 = j\frac{\pi}{2}$ and the integral
in (60) diverges near $y_0$. Therefore $\mathrm{mmse}(X, a^2)$ cannot be extended to any point on the imaginary
axis, hence it is not real analytic at $a = 0$. Nevertheless, when $\mathrm{Re}(a) \neq 0$, condition (56) is satisfied.
Hence $\mathrm{mmse}(X, a^2)$ is real analytic on the real line except zero, which can be shown from (60) directly.

Similarly, for any finite-alphabet, exponential or Gaussian distributed X, (57) and (58) can be verified
for all $a \neq 0$, hence the corresponding MMSE is real analytic at all positive SNR.
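The binary example can be probed numerically. For X equiprobable on $\{\pm 1\}$, the definition $h_0(y; a) = E\{\varphi(y - aX)\}$ is a two-term sum that also makes sense for complex a. The sketch below compares it with the product form $\varphi(y) \exp(-a^2/2) \cosh(ay)$ and locates one zero on the imaginary axis, at $a = jt$ and $y = \pi/(2t)$. The function names are illustrative.

```python
import cmath
import math

def phi(t):
    """Standard Gaussian density, evaluated for real or complex t."""
    return cmath.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def h0_definition(y, a):
    """h_0(y; a) = E{phi(y - a X)} for X equiprobable on {+1, -1}."""
    return 0.5 * phi(y - a) + 0.5 * phi(y + a)

def h0_closed_form(y, a):
    """Product form phi(y) * exp(-a^2 / 2) * cosh(a y)."""
    return phi(y) * cmath.exp(-0.5 * a * a) * cmath.cosh(a * y)

t = 2.0
y_zero = math.pi / (2.0 * t)            # cos(t * y) vanishes here
print(abs(h0_definition(0.3, 1.2) - h0_closed_form(0.3, 1.2)))
print(abs(h0_definition(y_zero, 1j * t)))
```

Both printed magnitudes should be at the level of floating-point rounding: the two forms agree, and $h_0$ vanishes at the chosen point for purely imaginary a, whereas it is strictly positive for real a.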

IV. Derivatives 

A. Derivatives of the MMSE 

With the smoothness of the MMSE established in Proposition 7, its first few derivatives with respect to
the SNR are explicitly calculated in this section. Consider first the Taylor series expansion of the MMSE
around $\mathrm{snr} = 0^+$ to the third order:^1

$$\mathrm{mmse}(X, \mathrm{snr}) = 1 - \mathrm{snr} + \left[ 2 - (E X^3)^2 \right] \frac{\mathrm{snr}^2}{2} - \left[ 15 - 12 (E X^3)^2 - 6\, E X^4 + (E X^4)^2 \right] \frac{\mathrm{snr}^3}{6} + O(\mathrm{snr}^4) \tag{61}$$

where X is assumed to have zero mean and unit variance. The first three derivatives of the MMSE at
$\mathrm{snr} = 0^+$ are thus evident from (61). The technique for obtaining (61) is to expand (12) in terms of the
small signal $\sqrt{\mathrm{snr}}\, X$, evaluate $h_i(y; \sqrt{\mathrm{snr}})$ given by (14) at the vicinity of $\mathrm{snr} = 0$ using the moments
of X (see equation (90) in [1]), and then calculate (16), where the integral over y can be evaluated as a
Gaussian integral.
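The third-order expansion can be compared against a numerically computed MMSE. For X equiprobable on $\{\pm 1\}$ we have $E X^3 = 0$ and $E X^4 = 1$, so the expansion reads $1 - \mathrm{snr} + \mathrm{snr}^2 - \tfrac{5}{3} \mathrm{snr}^3$; for standard Gaussian X ($E X^4 = 3$) it reads $1 - \mathrm{snr} + \mathrm{snr}^2 - \mathrm{snr}^3$, matching $1/(1 + \mathrm{snr})$. The sketch below encodes the coefficients as read from (61); the function names and quadrature settings are illustrative assumptions.

```python
import math

def mmse_binary(snr, half_width=12.0, steps=4800):
    """MMSE of X = +/-1 (equiprobable) from the tanh integral, trapezoidal rule."""
    h = 2.0 * half_width / steps
    root = math.sqrt(snr)
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * y * y) * math.tanh(snr - root * y)
    return 1.0 - h * total / math.sqrt(2.0 * math.pi)

def taylor_third_order(snr, third_moment, fourth_moment):
    """Third-order expansion for a zero-mean, unit-variance input."""
    c2 = 2.0 - third_moment ** 2
    c3 = (15.0 - 12.0 * third_moment ** 2
          - 6.0 * fourth_moment + fourth_moment ** 2)
    return 1.0 - snr + c2 * snr ** 2 / 2.0 - c3 * snr ** 3 / 6.0

for snr in (0.01, 0.05, 0.1):
    print(snr, mmse_binary(snr), taylor_third_order(snr, 0.0, 1.0))
```

The gap between the two columns should shrink roughly like $\mathrm{snr}^4$ as snr decreases.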

The preceding expansion of the MMSE at $\mathrm{snr} = 0^+$ can be lifted to arbitrary SNR using the
SNR-incremental result, Proposition 3. Finiteness of the input moments is not required for $\mathrm{snr} > 0$ because
the conditional moments are always finite due to Proposition 5.

For notational convenience, we define the following random variables:

$$M_i = E\{ (X - E\{X|Y\})^i \mid Y \}, \quad i = 1, 2, \ldots \tag{62}$$

which, according to Proposition 5, are well-defined in case $\mathrm{snr} > 0$, and reduce to the unconditional
moments of X in case $\mathrm{snr} = 0$. Evidently, $M_1 = 0$, $M_2 = \mathrm{var}\{X \mid \sqrt{\mathrm{snr}}\, X + N\}$ and

$$E\{M_2\} = \mathrm{mmse}(X, \mathrm{snr}). \tag{63}$$

If the input distribution $P_X$ is symmetric, then the distribution of $M_i$ is also symmetric for all odd i.

The derivatives of the MMSE are found to be the expected value of polynomials of $M_i$, whose existence
is guaranteed by Proposition 5.

Proposition 9: For every random variable X and every $\mathrm{snr} > 0$,

$$\frac{d\, \mathrm{mmse}(X, \mathrm{snr})}{d\, \mathrm{snr}} = -E\{M_2^2\} \tag{64}$$

$$\frac{d^2\, \mathrm{mmse}(X, \mathrm{snr})}{d\, \mathrm{snr}^2} = E\{2 M_2^3 - M_3^2\} \tag{65}$$

and

$$\frac{d^3\, \mathrm{mmse}(X, \mathrm{snr})}{d\, \mathrm{snr}^3} = E\{6 M_4 M_2^2 - M_4^2 + 12 M_3^2 M_2 - 15 M_2^4\}. \tag{66}$$

The three derivatives are also valid at $\mathrm{snr} = 0^+$ if X has finite second, third and fourth moment,
respectively.

^1 The previous result for the expansion of mmse(snr) around $\mathrm{snr} = 0^+$, given by equation (91) in [1], is mistaken in the
coefficient corresponding to $\mathrm{snr}^2$. The expansion of the mutual information given by (92) in [1] should also be corrected
accordingly. The second derivative of the MMSE is mistaken in [20] and corrected in Proposition 9 in this paper. The function
mmse(X, snr) is not always convex in snr as claimed in [20], as illustrated using an example in Fig. 1.

We relegate the proof of Proposition 9 to Appendix E. It is easy to check that the derivatives found
in Proposition 9 are consistent with the Taylor series expansion (61) at zero SNR.

In light of the proof of Proposition 7 (and (46)), the Taylor series expansion of the MMSE can be
carried out to arbitrary orders, so that all derivatives of the MMSE can be obtained as the expectation of
some polynomials of the conditional moments, although the resulting expressions become increasingly
complicated.

Proposition 9 is easily verified in the special case of standard Gaussian input ($X \sim \mathcal{N}(0, 1)$), where
conditioned on $Y = y$, the input is Gaussian distributed:

$$X \sim \mathcal{N}\left( \frac{\sqrt{\mathrm{snr}}}{1 + \mathrm{snr}}\, y,\; \frac{1}{1 + \mathrm{snr}} \right). \tag{67}$$

In this case $M_2 = (1 + \mathrm{snr})^{-1}$, $M_3 = 0$ and $M_4 = 3(1 + \mathrm{snr})^{-2}$ are constants, and (64), (65) and (66)
are straightforward.
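The Gaussian verification can be spelled out in exact arithmetic. With the constant conditional moments stated above, the moment polynomials $-M_2^2$, $2 M_2^3 - M_3^2$ and $6 M_4 M_2^2 - M_4^2 + 12 M_3^2 M_2 - 15 M_2^4$ (the forms of the first three derivatives as read from Proposition 9, an assumption of this sketch) should reproduce the derivatives of $1/(1 + \mathrm{snr})$. The function names are illustrative.

```python
from fractions import Fraction

def gaussian_conditional_moments(snr):
    """M_2, M_3, M_4 for a standard Gaussian input; constants in y."""
    s = Fraction(snr)
    return 1 / (1 + s), Fraction(0), 3 / (1 + s) ** 2

def derivatives_from_moments(snr):
    """First three SNR-derivatives of the MMSE from the moment polynomials."""
    m2, m3, m4 = gaussian_conditional_moments(snr)
    d1 = -m2 ** 2
    d2 = 2 * m2 ** 3 - m3 ** 2
    d3 = 6 * m4 * m2 ** 2 - m4 ** 2 + 12 * m3 ** 2 * m2 - 15 * m2 ** 4
    return d1, d2, d3

def derivatives_of_closed_form(snr):
    """Derivatives of 1 / (1 + snr), differentiated by hand."""
    s = Fraction(snr)
    return -1 / (1 + s) ** 2, 2 / (1 + s) ** 3, -6 / (1 + s) ** 4

print(derivatives_from_moments(2), derivatives_of_closed_form(2))
```

The two tuples coincide exactly; for the third derivative the coefficients combine as $18 - 9 - 15 = -6$.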

B. Derivatives of the Mutual Information 

Based on Propositions 8 and 9, the following derivatives of the mutual information are extensions of
the key information-estimation relationship (8).

Corollary 1: For every distribution $P_X$ and $\mathrm{snr} > 0$,

$$\frac{d^i}{d\, \mathrm{snr}^i}\, I(X; \sqrt{\mathrm{snr}}\, X + N) = \frac{(-1)^{i-1}}{2}\, E\{M_2^i\} \tag{68}$$

for $i = 1, 2$,

$$\frac{d^3}{d\, \mathrm{snr}^3}\, I(X; \sqrt{\mathrm{snr}}\, X + N) = E\left\{ M_2^3 - \frac{1}{2} M_3^2 \right\} \tag{69}$$

and

$$\frac{d^4}{d\, \mathrm{snr}^4}\, I(X; \sqrt{\mathrm{snr}}\, X + N) = \frac{1}{2}\, E\{ -M_4^2 + 6 M_4 M_2^2 + 12 M_3^2 M_2 - 15 M_2^4 \} \tag{70}$$

as long as the corresponding expectation on the right hand side exists. In case one of the two sets of
conditions in Proposition 8 holds, $\sqrt{\mathrm{snr}} \mapsto I(\sqrt{\mathrm{snr}}\, X + N; X)$ is also real analytic.

Corollary 1 is a generalization of previous results on the small SNR expansion of the mutual information
such as in [21]. Note that (68) with $i = 1$ is exactly the original relationship of the mutual information
and the MMSE given by (8) in light of (63).

C. Derivatives of the Conditional MMSE 

The derivatives in Proposition 9 can be generalized to the conditional MMSE defined in (20). The
following is a straightforward extension of (64).

Corollary 2: For every jointly distributed $(X, U)$ and $\mathrm{snr} > 0$,

$$\frac{d}{d\, \mathrm{snr}}\, \mathrm{mmse}(X, \mathrm{snr} | U) = -E\{ M_2^2(U) \} \tag{71}$$

where for every u and $i = 1, 2, \ldots$,

$$M_i(u) = E\{ [X_u - E\{X_u | Y\}]^i \mid Y = \sqrt{\mathrm{snr}}\, X_u + N \} \tag{72}$$

is a random variable dependent on u.
V. Properties of the MMSE Functional

For any fixed snr, mmse(X, snr) can be regarded as a functional of the input distribution $P_X$. Meanwhile,
the MMSE curve, $\{\mathrm{mmse}(X, \mathrm{snr}),\, \mathrm{snr} \in [0, \infty)\}$, can be regarded as a "transform" of the input
distribution.



A. Concavity in Input Distribution 

Proposition 10: The functional mmse(X, snr) is concave in $P_X$ for every $\mathrm{snr} > 0$.

Proof: Let B be a Bernoulli variable with probability $\alpha$ to be 0. Consider any random variables $X_0$,
$X_1$ independent of B. Let $Z = X_B$, whose distribution is $\alpha P_{X_0} + (1 - \alpha) P_{X_1}$. Consider the problem
of estimating Z given $\sqrt{\mathrm{snr}}\, Z + N$ where N is standard Gaussian. Note that if B is revealed, one can
choose either the optimal estimator for $P_{X_0}$ or $P_{X_1}$ depending on the value of B, so that the average
MMSE can be improved. Therefore,

$$\mathrm{mmse}(Z, \mathrm{snr}) \ge \mathrm{mmse}(Z, \mathrm{snr} | B) \tag{73}$$
$$= \alpha\, \mathrm{mmse}(X_0, \mathrm{snr}) + (1 - \alpha)\, \mathrm{mmse}(X_1, \mathrm{snr}) \tag{74}$$

which proves the desired concavity.^2 ■
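The concavity inequality can be illustrated on a small mixture. In the sketch below, $X_0$ is equiprobable on $\{\pm 1\}$, $X_1$ is equiprobable on $\{\pm 2\}$, and Z mixes them with weight $\alpha = 0.4$ on $X_0$; the MMSE of each finite-alphabet input is computed as $E\{X^2\}$ minus the integral of $h_1^2/h_0$. The specific inputs, the helper name `mmse_discrete` and the quadrature settings are illustrative assumptions.

```python
import math

def phi(t):
    """Standard Gaussian density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def mmse_discrete(points, probs, snr, steps=6000):
    """MMSE of a finite-alphabet input by trapezoidal quadrature over y."""
    a = math.sqrt(snr)
    lo = a * min(points) - 10.0
    hi = a * max(points) + 10.0
    h = (hi - lo) / steps
    integral = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        h0 = sum(p * phi(y - a * x) for x, p in zip(points, probs))
        h1 = sum(p * x * phi(y - a * x) for x, p in zip(points, probs))
        weight = 0.5 if k in (0, steps) else 1.0
        integral += weight * h1 * h1 / h0
    return sum(p * x * x for x, p in zip(points, probs)) - h * integral

alpha, snr = 0.4, 1.0
mixture = mmse_discrete([-1.0, 1.0, -2.0, 2.0],
                        [alpha / 2, alpha / 2, (1 - alpha) / 2, (1 - alpha) / 2], snr)
average = (alpha * mmse_discrete([-1.0, 1.0], [0.5, 0.5], snr)
           + (1 - alpha) * mmse_discrete([-2.0, 2.0], [0.5, 0.5], snr))
print(mixture, average)
```

The mixture value should exceed the weighted average: not knowing which component generated Z costs the estimator some accuracy.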



B. Conditioning Reduces the MMSE 

As a fundamental measure of uncertainty, the MMSE decreases with additional side information
available to the estimator. This is because an informed optimal estimator performs no worse than
any uninformed estimator by simply discarding the side information.

Proposition 11: For any jointly distributed $(X, U)$ and $\mathrm{snr} > 0$,

$$\mathrm{mmse}(X, \mathrm{snr} | U) \le \mathrm{mmse}(X, \mathrm{snr}). \tag{75}$$

For fixed $\mathrm{snr} > 0$, the equality holds if and only if X is independent of U.

Proof: The inequality (75) is straightforward by the concavity established in Proposition 10. In case
the equality holds, $P_{X|U=u}$ must be identical for $P_U$-almost every u due to strict concavity [22], that is,
X and U are independent. ■



C. Monotonicity 



Propositions 10 and 11 suggest that a mixture of random variables is harder to estimate than the
individual variables on average. A related result in [2] states that a linear combination of two random
variables $X_1$ and $X_2$ is also harder to estimate than the individual variables in some average:

Proposition 12 ([2]): For every $\mathrm{snr} > 0$ and $\alpha \in [0, 2\pi]$,

$$\mathrm{mmse}(\cos\alpha\, X_1 + \sin\alpha\, X_2, \mathrm{snr}) \ge \cos^2\alpha\; \mathrm{mmse}(X_1, \mathrm{snr}) + \sin^2\alpha\; \mathrm{mmse}(X_2, \mathrm{snr}). \tag{76}$$

A generalization of Proposition 12 concerns the MMSE of estimating a normalized sum of independent
random variables. Let $X_1, X_2, \ldots$ be i.i.d. with finite variance and $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$. It has
been shown that the entropy of $S_n$ increases monotonically to that of a Gaussian random variable of the
same variance [8], [23]. The following monotonicity result of the MMSE of estimating $S_n$ in Gaussian
noise can be established.

Proposition 13: Let $X_1, X_2, \ldots$ be i.i.d. with finite variance. Let $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$. Then
for every $\mathrm{snr} > 0$,

$$\mathrm{mmse}(S_{n+1}, \mathrm{snr}) \ge \mathrm{mmse}(S_n, \mathrm{snr}). \tag{77}$$

^2 Strict concavity is shown in [22].

Because of the central limit theorem, as $n \to \infty$ the MMSE converges to the MMSE of estimating a
Gaussian random variable with the same variance as that of X.

Proposition 13 is a simple corollary of the following general result in [8].

Proposition 14 ([8]): Let $X_1, \ldots, X_n$ be independent. For any $\lambda_1, \ldots, \lambda_n \ge 0$ which sum up to one
and any $\gamma > 0$,

$$\mathrm{mmse}\left( \sum_{i=1}^{n} X_i,\, \gamma \right) \ge \sum_{i=1}^{n} \lambda_i\; \mathrm{mmse}\left( \frac{X_{\setminus i}}{\sqrt{(n-1)\lambda_i}},\, \gamma \right) \tag{78}$$

where $X_{\setminus i} = \sum_{j \ne i} X_j$.

Setting $\lambda_i = 1/n$ in (78) yields Proposition 13.

In view of the representation of the entropy or differential entropy using the MMSE in Section I,
integrating both sides of (77) proves a monotonicity result of the entropy or differential entropy of $S_n$,
whichever is well-defined. More generally, [8] applies (11) and Proposition 14 to prove a more general
result, originally given in [23].

D. Gaussian Inputs Are the Hardest to Estimate 

Any non-Gaussian input achieves strictly smaller MMSE than Gaussian input of the same variance.
This well-known result is illustrated in Fig. 1 and stated as follows.

Proposition 15: For every $\mathrm{snr} > 0$ and random variable X with variance no greater than $\sigma^2$,

$$\mathrm{mmse}(X, \mathrm{snr}) \le \frac{\sigma^2}{1 + \mathrm{snr}\, \sigma^2}. \tag{79}$$

The equality of (79) is achieved if and only if the distribution of X is Gaussian with variance $\sigma^2$.

Proof: Due to Propositions 1 and 2, it is enough to prove the result assuming that $E\{X\} = 0$ and
$\mathrm{var}\{X\} = \sigma^2$. Consider the linear estimator for the channel (3):

$$\hat{X}^l = \frac{\sqrt{\mathrm{snr}}\, \sigma^2}{\mathrm{snr}\, \sigma^2 + 1}\, Y \tag{80}$$

which achieves the least mean-square error among all linear estimators, which is exactly the right hand
side of (79), regardless of the input distribution. The inequality (79) is evident due to the suboptimality
of the linearity restriction on the estimator. The strict inequality is established as follows: If the linear
estimator is optimal, then $E\{Y^k (X - \hat{X}^l)\} = 0$ for every $k = 1, 2, \ldots$, due to the orthogonality principle.
It is not difficult to check that all moments of X have to coincide with those of $\mathcal{N}(0, \sigma^2)$. By Carleman's
Theorem [24], the distribution is uniquely determined by the moments to be Gaussian. ■

Note that in case the variance of X is infinity, (79) reduces to (25).

E. The Single-Crossing Property



In view of Proposition 15 and the scaling property of the MMSE, at any given SNR, the MMSE
of a non-Gaussian input is equal to the MMSE of some Gaussian input with reduced variance. The
following result suggests that there is some additional simple ordering of the MMSEs due to Gaussian
and non-Gaussian inputs.

Proposition 16 (Single-crossing Property): For any given random variable X, the curve of $\mathrm{mmse}(X, \gamma)$
crosses the curve of $(1 + \gamma)^{-1}$, which is the MMSE function of the standard Gaussian distribution, at
most once on $(0, \infty)$. Precisely, define

$$f(\gamma) = (1 + \gamma)^{-1} - \mathrm{mmse}(X, \gamma) \tag{81}$$

on $[0, \infty)$. Then

1) $f(\gamma)$ is strictly increasing at every $\gamma$ with $f(\gamma) < 0$;

2) If $f(\mathrm{snr}_0) = 0$, then $f(\gamma) \ge 0$ at every $\gamma > \mathrm{snr}_0$;

3) $\lim_{\gamma \to \infty} f(\gamma) = 0$.

Furthermore, all three statements hold if the term $(1 + \gamma)^{-1}$ in (81) is replaced by $\sigma^2/(1 + \sigma^2 \gamma)$ with
any $\sigma$, which is the MMSE function of a Gaussian variable with variance $\sigma^2$.




Fig. 3. An example of the difference between the MMSE for standard Gaussian input and that of a binary input equally likely to be $\pm\sqrt{2}$. The difference crosses the horizontal axis only once.



Proof: The last of the three statements, $\lim_{\gamma \to \infty} f(\gamma) = 0$, always holds because of Proposition 4. If $\mathsf{var}\{X\} \le 1$, then $f(\gamma) \ge 0$ at all $\gamma$ due to Proposition 15, so that the proposition holds. We suppose in the following $\mathsf{var}\{X\} > 1$. An instance of the function $f(\gamma)$ with $X$ equally likely to be $\pm\sqrt{2}$ is shown in Fig. 3. Evidently $f(0) = 1 - \mathsf{var}\{X\} < 0$. Consider the derivative of the difference (81) at any $\gamma$ with $f(\gamma) < 0$, which by Proposition 9 can be written as

\begin{align}
f'(\gamma) &= \mathsf{E}\{M_2^2\} - (1 + \gamma)^{-2} \tag{82} \\
&> \mathsf{E}\{M_2^2\} - (\mathsf{mmse}(X, \gamma))^2 \tag{83} \\
&= \mathsf{E}\{M_2^2\} - (\mathsf{E} M_2)^2 \tag{84} \\
&\ge 0 \tag{85}
\end{align}

where (84) is due to (63), and (85) is due to Jensen's inequality. That is, $f'(\gamma) > 0$ as long as $f(\gamma) < 0$, i.e., the function $f$ can only be strictly increasing at every point where it is strictly negative. This further implies that if $f(\mathsf{snr}_0) = 0$ for some $\mathsf{snr}_0$, the function $f$, which is smooth, cannot dip below zero for any $\gamma > \mathsf{snr}_0$. Therefore, the function $f$ has no more than one zero crossing.

For any $\sigma$, the above arguments can be repeated with $\sigma^2 \gamma$ treated as the SNR. It is straightforward to show that the proposition holds with the standard Gaussian MMSE replaced by the MMSE of a Gaussian variable with variance $\sigma^2$. ■
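
The single-crossing behavior can be checked numerically for the binary example of Fig. 3. The sketch below evaluates $f(\gamma)$ of (81) on a grid for $X$ equally likely to be $\pm\sqrt{2}$ and counts sign changes. It assumes the standard closed form for the MMSE of a $\pm 1$ input together with the scaling property of the MMSE; the function names and the grid are illustrative choices, not part of the proof.

```python
import math

def gauss_expect(g, lo=-10.0, hi=10.0, n=2000):
    """Approximate E[g(N)], N ~ N(0,1), by composite Simpson's rule (n even)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * g(x) * math.exp(-0.5 * x * x)
    return total * h / (3.0 * math.sqrt(2.0 * math.pi))

def mmse_binary(gamma, a):
    """MMSE of X equally likely to be +a or -a (assumed closed form plus scaling)."""
    s = a * a * gamma  # scaling: mmse(aB, gamma) = a^2 mmse(B, a^2 gamma)
    return a * a * (1.0 - gauss_expect(lambda x: math.tanh(s + math.sqrt(s) * x)))

def f(gamma, a=math.sqrt(2.0)):
    """Difference (81) between the standard Gaussian MMSE and the binary MMSE."""
    return 1.0 / (1.0 + gamma) - mmse_binary(gamma, a)

def count_sign_changes(values):
    """Number of sign changes along the sequence, ignoring exact zeros."""
    signs = [v > 0 for v in values if v != 0.0]
    return sum(1 for p, q in zip(signs, signs[1:]) if p != q)

grid = [0.1 * i for i in range(201)]  # gamma in [0, 20]
values = [f(g) for g in grid]
print(values[0], count_sign_changes(values))
```

The first value approximates $f(0) = 1 - \mathsf{var}\{X\} = -1$, and the count of sign changes on the grid is the quantity that Proposition 16 bounds by one.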
The single-crossing property can be generalized to the conditional MMSE defined in (20).³
Proposition 17: Let $X$ and $U$ be jointly distributed variables. All statements in Proposition 16 hold literally if the function $f(\cdot)$ is replaced by

$$f(\gamma) = (1 + \gamma)^{-1} - \mathsf{mmse}(X, \gamma \,|\, U). \tag{86}$$

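Proof: For every $u$, let $X_u$ denote a random variable indexed by $u$ with distribution $P_{X|U=u}$. Define also a random variable for every $u$,

\begin{align}
M(u, \gamma) &= M_2(X_u, \gamma) \tag{87} \\
&= \mathsf{var}\{X_u \,|\, \sqrt{\gamma}\, X_u + N\} \tag{88}
\end{align}

where $N \sim \mathcal{N}(0, 1)$. Evidently, $\mathsf{E}\{M(u, \gamma)\} = \mathsf{mmse}(X_u, \gamma)$ and hence

\begin{align}
f(\gamma) &= \frac{1}{1 + \gamma} - \mathsf{E}\{\mathsf{E}\{M(U, \gamma) \,|\, U\}\} \tag{89} \\
&= \frac{1}{1 + \gamma} - \mathsf{E}\{M(U, \gamma)\}. \tag{90}
\end{align}

Clearly,

\begin{align}
f'(\gamma) &= -\frac{1}{(1 + \gamma)^2} - \mathsf{E}\left\{\frac{\mathrm{d}}{\mathrm{d}\gamma} M(U, \gamma)\right\} \tag{91} \\
&= \mathsf{E}\{M^2(U, \gamma)\} - \frac{1}{(1 + \gamma)^2} \tag{92}
\end{align}

by Proposition 9. In view of (90), for all $\gamma$ such that $f(\gamma) < 0$, we have

\begin{align}
f'(\gamma) &> \mathsf{E}\{M^2(U, \gamma)\} - (\mathsf{E}\{M(U, \gamma)\})^2 \tag{93} \\
&\ge 0 \tag{94}
\end{align}

by (92) and Jensen's inequality. The remaining argument is essentially the same as in the proof of Proposition 16. ■

³ The single-crossing property has also been extended to the parallel degraded MIMO scenario [25].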



F. The High-SNR Asymptotics 

The asymptotics of $\mathsf{mmse}(X, \gamma)$ as $\gamma \to \infty$ can be further characterized as follows. It is upper bounded by $1/\gamma$ due to Propositions 4 and 15. Moreover, the MMSE can vanish faster than exponentially in $\gamma$ with arbitrary rate, under for instance a sufficiently skewed binary input [26].⁴ On the other hand, the decay of the MMSE of a non-Gaussian random variable need not be faster than the MMSE of a Gaussian variable. For example, let $X = Z + \sqrt{\sigma_X^2 - 1}\, B$ where $\sigma_X > 1$, $Z \sim \mathcal{N}(0, 1)$ and the Bernoulli variable $B$ are independent. Clearly, $X$ is harder to estimate than $Z$ but no harder than $\sigma_X Z$, i.e.,

$$\frac{1}{1 + \gamma} < \mathsf{mmse}(X, \gamma) < \frac{\sigma_X^2}{1 + \sigma_X^2 \gamma} \tag{95}$$

where the difference between the upper and lower bounds is $O(\gamma^{-2})$. As a consequence, the function $f$ defined in (81) may not have any zero even if $f(0) = 1 - \sigma_X^2 < 0$ and $\lim_{\gamma \to \infty} f(\gamma) = 0$. A meticulous study of the high-SNR asymptotics of the MMSE is found in [22], where the limit of the product $\mathsf{snr} \cdot \mathsf{mmse}(X, \mathsf{snr})$, called the MMSE dimension, has been determined for input distributions without singular components.

VI. Applications to Channel Capacity 

A. Secrecy Capacity of the Gaussian Wiretap Channel 

This section makes use of the MMSE as an instrument to show that the secrecy capacity of the Gaussian wiretap channel is achieved by Gaussian inputs. The wiretap channel was introduced by Wyner in [27] in the context of discrete memoryless channels. Let $X$ denote the input, and let $Y$ and $Z$ denote the output of the main channel and the wiretapper's channel respectively. The problem is to find the rate at which reliable communication is possible through the main channel, while keeping the mutual information between the message and the wiretapper's observation as small as possible. Assuming that the wiretapper sees a degraded output of the main channel, Wyner showed that secure communication can achieve any rate up to the secrecy capacity

$$C_s = \max_{X}\, [I(X; Y) - I(X; Z)] \tag{96}$$

where the supremum is taken over all admissible choices of the input distribution. Wyner also derived the achievable rate-equivocation region.

⁴ In case the input is equally likely to be $\pm 1$, the MMSE decays as $e^{-\frac{1}{2}\mathsf{snr}}$, not $e^{-2\,\mathsf{snr}}$ as stated in [1], [26].

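We consider the following Gaussian wiretap channel studied in [28]:

\begin{align}
Y &= \sqrt{\mathsf{snr}_1}\, X + N_1 \tag{97a} \\
Z &= \sqrt{\mathsf{snr}_2}\, X + N_2 \tag{97b}
\end{align}

where $\mathsf{snr}_1 \ge \mathsf{snr}_2$ and $N_1, N_2 \sim \mathcal{N}(0, 1)$ are independent. Let the energy of every codeword of length $n$ be constrained by $\frac{1}{n} \sum_{i=1}^{n} x_i^2 \le 1$. Reference [28] showed that the optimal input which achieves the supremum in (96) is standard Gaussian and that the secrecy capacity is

$$C_s = \frac{1}{2} \log\left(\frac{1 + \mathsf{snr}_1}{1 + \mathsf{snr}_2}\right). \tag{98}$$
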



In contrast to [28] which appeals to Shannon's EPI, we proceed to give a simple proof of the same result using the integral relationship between the mutual information and the MMSE, which enables us to write for any $X$:

$$I(X; Y) - I(X; Z) = \frac{1}{2} \int_{\mathsf{snr}_2}^{\mathsf{snr}_1} \mathsf{mmse}(X, \gamma)\, \mathrm{d}\gamma. \tag{99}$$

Under the constraint $\mathsf{E}\{X^2\} \le 1$, the maximum of (99) over $X$ is achieved by standard Gaussian input because it maximizes the MMSE for every SNR under the power constraint. Plugging $\mathsf{mmse}(X, \gamma) = (1 + \gamma)^{-1}$ into (99) yields the secrecy capacity given in (98). In fact the whole rate-equivocation region can
be obtained using the same techniques. Note that the MIMO wiretap channel can be treated similarly yjj. 
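
The step from (99) to (98) can be reproduced numerically. The following sketch integrates the Gaussian MMSE $(1 + \gamma)^{-1}$ between the two SNRs and compares the result with the closed form (98); for contrast it also evaluates (99) for a unit-power binary input, assuming the standard closed form for the MMSE of a $\pm 1$ input. The SNR values and all function names are arbitrary choices for illustration, and rates are in nats.

```python
import math

def simpson(fn, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = fn(lo) + fn(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(lo + i * h)
    return total * h / 3.0

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mmse_gaussian(gamma):
    """MMSE of a standard Gaussian input."""
    return 1.0 / (1.0 + gamma)

def mmse_binary(gamma):
    """Assumed closed form for X equally likely to be +1 or -1."""
    integrand = lambda x: math.tanh(gamma + math.sqrt(gamma) * x) * phi(x)
    return 1.0 - simpson(integrand, -10.0, 10.0, 2000)

def secrecy_rate(mmse_fn, snr1, snr2):
    """Right hand side of (99) for a given MMSE function."""
    return 0.5 * simpson(mmse_fn, snr2, snr1)

snr1, snr2 = 4.0, 1.0
closed_form = 0.5 * math.log((1.0 + snr1) / (1.0 + snr2))  # (98)
rate_gaussian = secrecy_rate(mmse_gaussian, snr1, snr2)
rate_binary = secrecy_rate(mmse_binary, snr1, snr2)
print(closed_form, rate_gaussian, rate_binary)
```

The Gaussian rate should agree with (98) up to quadrature error, and the binary rate should be smaller, since the Gaussian input maximizes the integrand of (99) at every SNR.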



B. The Gaussian Broadcast Channel 

In this section, we use the single-crossing property to show that Gaussian input achieves the capacity region of scalar Gaussian broadcast channels. Consider a degraded Gaussian broadcast channel also described by the same model (97). Note that the formulation of the Gaussian broadcast channel is statistically identical to that of the Gaussian wiretap channel, except for a different goal: The rates between the sender and both receivers are to be maximized, rather than minimizing the rate between the sender and the (degraded) wiretapper. The capacity region of degraded broadcast channels under a unit input power constraint is given by [29]:

$$\bigcup_{P_{UX}:\ \mathsf{E}\{X^2\} \le 1} \left\{ \begin{array}{l} R_1 \le I(X; Y | U) \\ R_2 \le I(U; Z) \end{array} \right\} \tag{100}$$


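where $U$ is an auxiliary random variable with $U$–$X$–$(Y, Z)$ being a Markov chain. It has long been recognized that Gaussian $P_{UX}$ with standard Gaussian marginals and correlation coefficient $\mathsf{E}\{UX\} = \sqrt{1 - \alpha}$ achieves the capacity. The resulting capacity region of the Gaussian broadcast channel is

$$\bigcup_{\alpha \in [0, 1]} \left\{ \begin{array}{l} R_1 \le \frac{1}{2} \log(1 + \alpha\, \mathsf{snr}_1) \\ R_2 \le \frac{1}{2} \log\left(\dfrac{1 + \mathsf{snr}_2}{1 + \alpha\, \mathsf{snr}_2}\right) \end{array} \right\} \tag{101}$$
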

The conventional proof of the optimality of Gaussian inputs relies on the EPI in conjunction with Fano's inequality [30]. The converse can also be proved directly from (100) using only the EPI [31], [32]. In the following we show a simple alternative proof using the single-crossing property of MMSE.



Fig. 4. The thin curves show the MMSE (solid line) and mutual information (dashed line) of a Gaussian input. The thick curves show the MMSE (solid) and mutual information (dashed) of binary input. The two mutual informations are identical at $\mathsf{snr}_2$, which must be greater than $\mathsf{snr}_0$ where the two MMSE curves cross.



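Due to the power constraint on $X$, there must exist $\alpha \in [0, 1]$ (dependent on the distribution of $X$) such that

\begin{align}
I(X; Z | U) &= \frac{1}{2} \log(1 + \alpha\, \mathsf{snr}_2) \tag{102} \\
&= \frac{1}{2} \int_0^{\mathsf{snr}_2} \frac{\alpha}{\alpha \gamma + 1}\, \mathrm{d}\gamma. \tag{103}
\end{align}

By the chain rule,

\begin{align}
I(U; Z) &= I(U, X; Z) - I(X; Z | U) \tag{104} \\
&= I(X; Z) - I(X; Z | U). \tag{105}
\end{align}

By (100) and (102), the desired bound on $R_2$ is established:

\begin{align}
R_2 &\le \frac{1}{2} \log(1 + \mathsf{snr}_2) - \frac{1}{2} \log(1 + \alpha\, \mathsf{snr}_2) \tag{106} \\
&= \frac{1}{2} \log\left(\frac{1 + \mathsf{snr}_2}{1 + \alpha\, \mathsf{snr}_2}\right). \tag{107}
\end{align}
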

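It remains to establish the desired bound for $R_1$. The idea is illustrated in Fig. 4, where crossing of the MMSE curves implies some ordering of the corresponding mutual informations. Note that

$$I(X; Z | U = u) = \frac{1}{2} \int_0^{\mathsf{snr}_2} \mathsf{mmse}(X_u, \gamma)\, \mathrm{d}\gamma \tag{108}$$

and hence

$$I(X; Z | U) = \frac{1}{2} \int_0^{\mathsf{snr}_2} \mathsf{E}\{\mathsf{mmse}(X_U, \gamma | U)\}\, \mathrm{d}\gamma. \tag{109}$$

Comparing (109) with (103), there must exist $0 \le \mathsf{snr}_0 \le \mathsf{snr}_2$ such that

$$\mathsf{E}\{\mathsf{mmse}(X_U, \mathsf{snr}_0 | U)\} = \frac{\alpha}{\alpha\, \mathsf{snr}_0 + 1}. \tag{110}$$

By Proposition 17, this implies that for all $\gamma \ge \mathsf{snr}_2 \ge \mathsf{snr}_0$,

$$\mathsf{E}\{\mathsf{mmse}(X_U, \gamma | U)\} \le \frac{\alpha}{\alpha \gamma + 1}. \tag{111}$$

Consequently,

\begin{align}
R_1 &\le I(X; Y | U) \tag{112} \\
&= \frac{1}{2} \int_0^{\mathsf{snr}_1} \mathsf{E}\{\mathsf{mmse}(X_U, \gamma | U)\}\, \mathrm{d}\gamma \tag{113} \\
&= \frac{1}{2} \left( \int_0^{\mathsf{snr}_2} + \int_{\mathsf{snr}_2}^{\mathsf{snr}_1} \right) \mathsf{E}\{\mathsf{mmse}(X_U, \gamma | U)\}\, \mathrm{d}\gamma \tag{114} \\
&\le \frac{1}{2} \log(1 + \alpha\, \mathsf{snr}_2) + \frac{1}{2} \int_{\mathsf{snr}_2}^{\mathsf{snr}_1} \frac{\alpha}{\alpha \gamma + 1}\, \mathrm{d}\gamma \tag{115} \\
&= \frac{1}{2} \log(1 + \alpha\, \mathsf{snr}_1) \tag{116}
\end{align}

where the inequality (115) is due to (102), (109) and (111).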



C. Proof of a Special Case of EPI

As another simple application of the single-crossing property, we show in the following that

$$e^{2h(X + Z)} \ge e^{2h(X)} + 2\pi e\, \sigma_Z^2 \tag{117}$$

for any independent $X$ and $Z$ as long as the differential entropy of $X$ is well-defined and $Z$ is Gaussian with variance $\sigma_Z^2$. This is in fact a special case of Shannon's entropy power inequality. Let $W \sim \mathcal{N}(0, 1)$ and $a^2$ be the ratio of the entropy powers of $X$ and $W$, so that

$$h(X) = h(aW) = \frac{1}{2} \log(2\pi e\, a^2). \tag{118}$$

Consider the difference

$$h(\sqrt{\mathsf{snr}}\, X + N) - h(\sqrt{\mathsf{snr}}\, aW + N) = \frac{1}{2} \int_0^{\mathsf{snr}} \mathsf{mmse}(X, \gamma) - \mathsf{mmse}(aW, \gamma)\, \mathrm{d}\gamma \tag{119}$$


where $N$ is standard Gaussian independent of $X$ and $W$. In the limit of $\mathsf{snr} \to \infty$, the left hand side of (119) vanishes due to (118). By Proposition 16, the integrand in (119) as a function of $\gamma$ crosses zero only once, which implies that the integrand is initially positive, and then becomes negative after the zero crossing (cf. Fig. 3). Consequently, the integral (119) is positive and increasing for small $\mathsf{snr}$, and starts to monotonically decrease after the zero crossing. If the integral crosses zero it will not be able to cross zero again. Hence the integral in (119) must remain positive for all $\mathsf{snr}$ (otherwise it has to be strictly negative as $\mathsf{snr} \to \infty$). Therefore,

\begin{align}
\exp\left(2h(\sqrt{\mathsf{snr}}\, X + N)\right) &\ge \exp\left(2h(\sqrt{\mathsf{snr}}\, aW + N)\right) \tag{120} \\
&= 2\pi e\, (a^2\, \mathsf{snr} + 1) \tag{121} \\
&= \exp\left(2h(\sqrt{\mathsf{snr}}\, X)\right) + 2\pi e \tag{122}
\end{align}

which is equivalent to (117) by choosing $\mathsf{snr} = \sigma_Z^{-2}$ and appropriate scaling.

The preceding proof technique also applies to conditional EPI, which concerns $h(X|U)$ and $h(X + Z|U)$, where $Z$ is Gaussian independent of $U$. The conditional EPI can be used to establish the capacity region of the scalar broadcast channel in [30], [31].
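
As a sanity check of (117), the following sketch evaluates both sides numerically for one concrete non-Gaussian input. The choice $X \sim \text{Uniform}[0, 1]$ (so that $h(X) = 0$ nats) is an assumption made purely for illustration, as are the function names and the integration grid; the density of $X + Z$ is then a difference of two Gaussian distribution functions.

```python
import math

def Phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def entropy_uniform_plus_gaussian(sigma, n=20000):
    """Differential entropy (nats) of X + Z, X ~ Uniform[0,1], Z ~ N(0, sigma^2)."""
    lo, hi = -10.0 * sigma, 1.0 + 10.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        p = Phi(y / sigma) - Phi((y - 1.0) / sigma)  # density of X + Z at y
        if p > 0.0:
            w = 0.5 if i in (0, n) else 1.0  # trapezoidal weights
            total -= w * p * math.log(p)
    return total * h

def epi_sides(sigma):
    """Return (left, right) hand sides of (117) for the uniform input."""
    lhs = math.exp(2.0 * entropy_uniform_plus_gaussian(sigma))
    rhs = 1.0 + 2.0 * math.pi * math.e * sigma ** 2  # exp(2 h(X)) = 1
    return lhs, rhs

for sigma in (0.5, 1.0, 2.0):
    print(sigma, epi_sides(sigma))
```

For each noise level the left hand side should exceed the right hand side, and it should not exceed the Gaussian maximum-entropy value $2\pi e(\sigma_Z^2 + 1/12)$ for the same total variance.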

VII. Concluding Remarks

This paper has established a number of basic properties of the MMSE in Gaussian noise as a transform 
of the input distribution and function of the SNR. Because of the intimate relationship MMSE has with 
information measures, its properties find direct use in a number of problems in information theory. 

The MMSE can be viewed as a transform from the input distribution to a function of the SNR: $P_X \mapsto \{\mathsf{mmse}(P_X, \gamma),\ \gamma \in [0, \infty)\}$. An interesting question remains to be answered: Is this transform one-to-one? We have the following conjecture:

Conjecture 1: For any zero-mean random variables $X$ and $Z$, $\mathsf{mmse}(X, \mathsf{snr}) = \mathsf{mmse}(Z, \mathsf{snr})$ for all $\mathsf{snr} \in [0, \infty)$ if and only if $X$ is identically distributed as either $Z$ or $-Z$.

There is an intimate relationship between the real analyticity of MMSE and Conjecture 1. In particular, MMSE being real-analytic at zero SNR for all inputs and MMSE being an injective transform on the set of all random variables (with shift and reflection identified) cannot both hold. This is because given the real analyticity at zero SNR, MMSE can be extended to an open disk $D$ centered at zero via the power series expansion, where the coefficients depend only on the moments of $X$. Since the solution to the Hamburger moment problem is not unique in general, there may exist different $X$ and $X'$ with the same moments, and hence their MMSE functions coincide in $D$. By the identity theorem of analytic functions, they coincide everywhere, hence on the real line. Nonetheless, if one is restricted to the class of sub-Gaussian random variables, the moments determine the distribution uniquely by Carleman's condition [24].



Appendix A 
Proof of Proposition 5



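Proof: Let $Y = \sqrt{\mathsf{snr}}\, X + N$ with $\mathsf{snr} > 0$. Using $X - \mathsf{E}\{X|Y\} = \mathsf{snr}^{-1/2} (\mathsf{E}\{N|Y\} - N)$ and then Jensen's inequality twice, we have

\begin{align}
\mathsf{E}\{|X - \mathsf{E}\{X|Y\}|^n\} &= \mathsf{snr}^{-\frac{n}{2}}\, 2^n\, \mathsf{E}\{2^{-n} |\mathsf{E}\{N|Y\} - N|^n\} \tag{123} \\
&\le \mathsf{snr}^{-\frac{n}{2}}\, 2^{n-1}\, \mathsf{E}\{|\mathsf{E}\{N|Y\}|^n + |N|^n\} \tag{124} \\
&\le \mathsf{snr}^{-\frac{n}{2}}\, 2^n\, \mathsf{E}\{|N|^n\} \tag{125}
\end{align}

which leads to (28) because

\begin{align}
\mathsf{E}\{|N|^n\} &= \sqrt{\frac{2^n}{\pi}}\; \Gamma\left(\frac{n + 1}{2}\right) \tag{126} \\
&\le \sqrt{n!}. \tag{127}
\end{align}
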
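The moment formula (126) and the bound (127) are easy to tabulate. The sketch below does so with the standard library gamma function; the function name `abs_moment` and the range of $n$ are illustrative choices.

```python
import math

def abs_moment(n):
    """E|N|^n for N ~ N(0,1), evaluated via the closed form (126)."""
    return math.sqrt(2.0 ** n / math.pi) * math.gamma((n + 1) / 2.0)

# Tabulate the moment next to the bound sqrt(n!) of (127).
for n in range(1, 11):
    print(n, abs_moment(n), math.sqrt(math.factorial(n)))
```

For even $n$ the closed form reduces to the familiar double factorial $(n - 1)!!$, e.g. 1 for $n = 2$ and 3 for $n = 4$.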

Appendix B 
Proof of Proposition 6

Proof: We use the characterization by moment generating function in Lemma 1:

- ^^"'f .^UJ^t + ay)X-'^]\ (129) 



ho{y;a) 



<T^e.pf(y^| (130) 



ho{y;a) \ 2a^ 

fiy) , 2 

r exp — -^ ^■ 

ho{y;a) \a 



< T^7^exp( ) (131) 
where (130) and (131) are due to elementary inequalities. Using Chernoff's bound and (131), we have

P{Xy >x}< E|e*(^«-^)} (132) 



ho{y;a) V 



for all X, t > 0. Choosing t 



yields 



P {Xy >X}< 



e 2 



Similarly, $\mathsf{P}\{X_y \le -x\}$ admits the same bound as above, and (32) follows from the union bound. Then, using an alternative formula for moments [33, p. 319]:



ax 



(133) 



(134) 



E{\Xy\"} = n x^^-^P {\Xy\ > x} dx 
Jo 



ho{y;a) Jq \^/2, 



< 



ne 2 



V2 

ho{y;a) \ \a\ 



E{|iV|-i} 



(135) 
(136) 
(137) 



where $N \sim \mathcal{N}(0, 1)$ and (136) is due to (32). The inequality (33) is thus established by also noting (127).



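Conditioned on $Y = y$, using similar techniques leading to (125), we have

\begin{align}
\mathsf{E}\{|X - \mathsf{E}\{X|Y\}|^n \,|\, Y = y\} &\le 2^{n-1} \left( \mathsf{E}\{|X|^n \,|\, Y = y\} + |\mathsf{E}\{X | Y = y\}|^n \right) \tag{138} \\
&\le 2^n\, \mathsf{E}\{|X|^n \,|\, Y = y\} \tag{139}
\end{align}

which is (34). ■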



Appendix C 
Proof of Lemma 2

We first make the following observation: 

Lemma 3: For every i = 0, 1, . . . , the function gi is a finite weighted sum of functions of the following 
form: 



h: 



k- 

,-=1 



(140) 



where nj,mj, k = 0,1, ... . 

Proof: We proceed by induction on $i$: The lemma holds for $i = 0$ by definition of $g_0$. Assume the induction hypothesis holds for $i$. Then



(141) 



^0 1=1 



which proves the lemma. ■

To show the absolute integrability of $g_i$, it suffices to show that the function in (140) is integrable:

k 



1 j^ d^'^hn.iyia) 



dy 



k 



1 d^^^hr 



HY-a) 



ho{Y;a) da^^ 
= E I n I E { H^^ {¥ - aX) \ Y] 

k 

< n [e { (E { {Y -aX)\\ Y]f] 

k 

< n [E{|X|'=('^^+"^^)}E{|i?„^.(7V)|'=} 

< oo 



(142) 

(143) 

(144) 

(145) 
(146) 



where (143) is by (41), (144) is by the generalized Hölder inequality [34, p. 46], and (145) is due to Jensen's inequality and the independence of $X$ and $N = Y - aX$.

Appendix D 
Proof of Proposition 8 on the Analyticity

We first assume that $X$ is sub-Gaussian.

Note that $\varphi$ is real analytic everywhere with infinite radius of convergence, because $\varphi^{(n)}(y) = (-1)^n H_n(y) \varphi(y)$ and Hermite polynomials admit the following bound [35, p. 997]:

$$|H_n(y)| \le \kappa \sqrt{n!}\; e^{\frac{y^2}{4}} \tag{147}$$

where $\kappa$ is an absolute constant. Hence

$$\lim_{n \to \infty} \left| \frac{\varphi^{(n)}(y)}{n!} \right|^{\frac{1}{n}} = 0 \tag{148}$$

and the radius of convergence is infinite at all $y$. Then

$$\varphi(y - a'x) = \sum_{n=0}^{\infty} H_n(y - ax)\, \varphi(y - ax)\, x^n\, \frac{(a' - a)^n}{n!} \tag{149}$$


holds for all $a, x \in \mathbb{R}$. By Lemma 1, there exists $c > 0$, such that $\mathsf{E}\{|X|^n\} \le c^n \sqrt{n!}$ for all $n = 1, 2, \ldots$. By (147), it is easy to see that $|H_n(y) \varphi(y)| \le \kappa \sqrt{n!}$ for every $y$. Hence

$$\mathsf{E}\{|H_n(y - aX)\, \varphi(y - aX)\, X^n|\} \le \kappa\, c^n\, n!. \tag{150}$$

Thus for every $|a' - a| < R = \frac{1}{c}$,

$$\sum_{n=0}^{\infty} \frac{|a' - a|^n}{n!}\, \mathsf{E}\{|(H_n \cdot \varphi)(y - aX)\, X^n|\} < \infty. \tag{151}$$

Applying Fubini's theorem to (149) yields

$$h_0(y; a') = \sum_{n=0}^{\infty} \frac{(a' - a)^n}{n!}\, \mathsf{E}\{(H_n \cdot \varphi)(y - aX)\, X^n\}. \tag{152}$$

Therefore, $h_0(y; a)$ is real analytic at $a$ and the radius of convergence is lower bounded by $R$ independent of $y$. Similar conclusions also apply to $h_1(y; a)$ and

$$h_1(y; a') = \sum_{n=0}^{\infty} \frac{(a' - a)^n}{n!}\, \mathsf{E}\{(H_n \cdot \varphi)(y - aX)\, X^{n+1}\} \tag{153}$$

holds for all $y \in \mathbb{R}$ and all $|a' - a| < R$. Extend $h_0(y; a)$ and $h_1(y; a)$ to the complex disk $D(a, R)$ by the power series (152) and (153). By (55), there exists $0 < r < R/2$, such that $h_0(y; z)$ does not vanish on the disk $D(a, r)$. By [19, Proposition 1.1.5], for all $y \in \mathbb{R}$,



9o{y;z) 



ho{y;z) 



is analytic in $z$ on $D(a, r)$.

By assumption (56 1, there exist B,c > 0, such that 

|/io(y;^;)l > c/io(y;Re(z)) 
for all 2; G D(o, r) and all \y\ > B. Define 



B 



go{y-z)dy. 



(154) 



(155) 



(156) 



Since (y, z) 1— )• 5'o(y; z) is continuous, for every closed curve 7 in -D(a, r), we have Jj^^ \9Q{y\ z)\dydz < 
00. By Fubini's theorem. 



B 



go{y; a)dydz 



^ J-B 



B 



B J J 



go{y;a)dzdy = 



(157) 



where the last equality follows from the analyticity of $g_0(y; \cdot)$. By Morera's theorem [36, Theorem 3.1.4], $m_0^B$ is analytic on $D(a, r)$.

Next we show that as $B \to \infty$, $m_0^B$ tends to $m_0$ uniformly in $z \in D(a, r)$. Since the uniform limit of analytic functions is analytic [37, p. 156], we obtain the analyticity of $m_0$. To this end, it is sufficient to show that $\{|g_0(\cdot\,; z)| : z \in D(a, r)\}$ is uniformly integrable. Let $z = s + it$. Then



\hi{y;z)\ = \E{Xip{y-zX)}\ 
< E{\XMy-zX)\} 



(158) 
(159) 

(160) 



Therefore, for all z G D{a,r), 

\go{y;z)\^dy 



K 



\go{y;z)\'^dy 



< 



< 



1 



hiiy;z) 



ho{y;s) 

{I 



hl{y;s)dy 



ifiy-sX)] 



ho{y;s) 



{(E{|X|e^|y.})^} 



ho{y;s)dy 



(161) 

(162) 

(163) 
(164) 
where (161) is by (56), (162) is by $|h_0(y; s)| \le 1$, (163) is by
inequality and \t\ < r. Since X is sub-Gaussian satisfying (|29]) and r < i?/2 = l/(2c 

(2r2)" 



, and ( |164 ) is due to Jensen's 



< 



n=Q 

oo 

E 

n=0 



n! 



(2r2)" 



n! 



-E{|X|2"+4} 
V(2n + 4)! c2"+4 



< 4c''^(n2 + 3n + 2)(2rc) 



2n 



n=0 



(165) 

(166) 

(167) 
(168) 

Therefore $\{|g_0(\cdot\,; z)| : z \in D(a, r)\}$ is $L^2$-bounded, hence uniformly integrable. We have thus shown that $m_0(a)$, i.e., the MMSE, is real analytic in $a$ on $\mathbb{R}$.

We next consider positive SNR and drop the assumption of sub-Gaussianity of $X$. Let $a_0 > 0$ and fix $\delta$ with $0 < \sqrt{\delta} < a_0/2$. We use the incremental-SNR representation for MMSE in (48). Define $X_u$ to be distributed according to $X - \mathsf{E}\{X | Y_\delta = u\}$ conditioned on $Y_\delta = u$ and recall the definition of $h_0(y; a | u; \delta)$ and $h_1(y; a | u; \delta)$ in (49). In view of Proposition 6, $X_u$ is sub-Gaussian whose growth of moments only depends on $\delta$ (the bounds depend on $u$ but the terms varying with $n$ do not depend on $u$). Repeating the arguments from (147) to (153) with $c = \sqrt{2/\delta}$, we conclude that $h_0(y; a | u; \delta)$ and $h_1(y; a | u; \delta)$ are analytic in $a$ and the radius of convergence is lower bounded by $R = 1/c$, independent of $u$ and $y$.

Let $r < \sqrt{\delta}/4$. The remaining argument follows as in the first part of this proof, except that (161)-(168) are replaced by the following estimates: Let $\tau = t^2/2$, then

4" 



(169) 
(170) 

(171) 

(172) 

(173) 
(174) 




< oo 



where (169) is by Jensen's inequality, (170) is by Fubini's theorem, (174) is because $\tau \le r^2/2 < \delta/32$, and (171) is by Lemma 4, to be established next.

Let $M_i$ be defined as in Section IV-A. The following lemma bounds the expectation of products of $|M_i|$:

Lemma 4: For any $\mathsf{snr} > 0$ and $k, i_j, n_j \in \mathbb{N}$,

k 



E I 



< snr 



(175) 
where $n = \sum_{j=1}^{k} i_j n_j$.

Proof: In view of Proposition 5, it suffices to establish:



k rij ij_ 

^nn(E{w/})" 

j=i /=i 

k rij 

<llll{E{\X-E{X\Y}n)'-i 

3=1 1=1 

= E{|X-E{X| 



(176) 

(177) 

(178) 
(179) 



where (177) and (178) are due to the generalized Hölder's inequality and Jensen's inequality, respectively. ■



Appendix E 
Proof of Proposition 9 on the Derivatives

The first derivative of the mutual information with respect to the SNR is derived in [1] using the incremental channel technique. The same technique is adequate for the analysis of the derivatives of various other information theoretic and estimation theoretic quantities.

The MMSE of estimating an input with zero mean, unit variance and finite higher-order moments admits the Taylor series expansion in the vicinity of zero SNR given by (61). In general, given a random variable $X$ with arbitrary mean and variance, we denote its central moments by

$$m_i = \mathsf{E}\{(X - \mathsf{E}\{X\})^i\}, \quad i = 1, 2, \ldots. \tag{180}$$

Suppose all moments of $X$ are finite. Then the random variable can be represented as $X = \mathsf{E}\{X\} + \sqrt{m_2}\, Z$ where $Z$ has zero mean and unit variance. Clearly, $\mathsf{E}\{Z^i\} = m_2^{-i/2}\, m_i$. By (61) and Proposition 2,

\begin{align}
\mathsf{mmse}(X, \mathsf{snr}) &= m_2\, \mathsf{mmse}(Z, \mathsf{snr}\, m_2) \tag{181} \\
&= m_2 - m_2^2\, \mathsf{snr} + (2 m_2^3 - m_3^2)\, \frac{\mathsf{snr}^2}{2} - (m_4^2 - 6 m_4 m_2^2 - 12 m_3^2 m_2 + 15 m_2^4)\, \frac{\mathsf{snr}^3}{6} + O(\mathsf{snr}^4). \tag{182}
\end{align}

In general, taking into account the input variance, we have:

\begin{align}
\mathsf{mmse}'(X, 0) &= -m_2^2 \tag{183} \\
\mathsf{mmse}''(X, 0) &= 2 m_2^3 - m_3^2 \tag{184} \\
\mathsf{mmse}'''(X, 0) &= -m_4^2 + 6 m_4 m_2^2 + 12 m_3^2 m_2 - 15 m_2^4. \tag{185}
\end{align}
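
The low-SNR expansion (182) can be compared with a directly computed MMSE. The sketch below builds the third-order Taylor polynomial from the central moments via (183)-(185) and evaluates it for an input equally likely to be $\pm 1$ (so $m_2 = 1$, $m_3 = 0$, $m_4 = 1$), against the standard closed form for that input, which is assumed here rather than taken from this appendix. The function names and the test SNR are illustrative.

```python
import math

def mmse_taylor(m2, m3, m4, snr):
    """Third-order Taylor polynomial of the MMSE at zero SNR, using (183)-(185)."""
    d1 = -m2 ** 2
    d2 = 2.0 * m2 ** 3 - m3 ** 2
    d3 = -m4 ** 2 + 6.0 * m4 * m2 ** 2 + 12.0 * m3 ** 2 * m2 - 15.0 * m2 ** 4
    return m2 + d1 * snr + d2 * snr ** 2 / 2.0 + d3 * snr ** 3 / 6.0

def mmse_binary(snr, lo=-10.0, hi=10.0, n=2000):
    """Assumed closed form 1 - E[tanh(snr + sqrt(snr) N)], via Simpson's rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * math.tanh(snr + math.sqrt(snr) * x) * math.exp(-0.5 * x * x)
    return 1.0 - total * h / (3.0 * math.sqrt(2.0 * math.pi))

snr = 0.05
approx = mmse_taylor(1.0, 0.0, 1.0, snr)  # central moments of the +/-1 input
exact = mmse_binary(snr)
print(approx, exact, abs(approx - exact))
```

At this SNR the cubic term contributes about $2 \times 10^{-4}$, so agreement to within $10^{-4}$ indicates that the third-order coefficient is being matched and not only the first two.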



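Now that the MMSE at an arbitrary SNR is rewritten as the expectation of MMSEs at zero SNR, we can make use of known derivatives at zero SNR to obtain derivatives at any SNR. Let $X_{y;\mathsf{snr}} \sim P_{X | Y_{\mathsf{snr}} = y}$. Because of (183),

$$\left. \frac{\mathrm{d}\, \mathsf{mmse}(X_{y;\mathsf{snr}}, \gamma)}{\mathrm{d}\gamma} \right|_{\gamma = 0^+} = -\left(\mathsf{var}\{X | Y_{\mathsf{snr}} = y\}\right)^2. \tag{186}$$

Thus,

\begin{align}
\frac{\mathrm{d}\, \mathsf{mmse}(X, \mathsf{snr})}{\mathrm{d}\, \mathsf{snr}} &= \left. \frac{\mathrm{d}}{\mathrm{d}\gamma}\, \mathsf{mmse}(X, \mathsf{snr} + \gamma) \right|_{\gamma = 0^+} \tag{187} \\
&= \left. \frac{\mathrm{d}}{\mathrm{d}\gamma}\, \mathsf{mmse}(X, \gamma | Y_{\mathsf{snr}}) \right|_{\gamma = 0^+} \tag{188} \\
&= -\mathsf{E}\left\{ \left(\mathsf{var}\{X | Y_{\mathsf{snr}}\}\right)^2 \right\} \tag{189} \\
&= -\mathsf{E}\{M_2^2\} \tag{190}
\end{align}

where (188) is due to Proposition 3 and the fact that the distribution of $Y_{\mathsf{snr}}$ is not dependent on $\gamma$, and (189) is due to (186) and averaging over $y$ according to the distribution of $Y_{\mathsf{snr}} = \sqrt{\mathsf{snr}}\, X + N$. Hence (64) is proved. Moreover, because of (184),

$$\left. \frac{\mathrm{d}^2\, \mathsf{mmse}(X_{y;\mathsf{snr}}, \gamma)}{\mathrm{d}\gamma^2} \right|_{\gamma = 0^+} = 2 \left(\mathsf{var}\{X | Y_{\mathsf{snr}} = y\}\right)^3 - \left[ \mathsf{E}\{(X - \mathsf{E}\{X | Y_{\mathsf{snr}}\})^3 \,|\, Y_{\mathsf{snr}} = y\} \right]^2$$

which leads to (65) after averaging over the distribution of $Y_{\mathsf{snr}}$. Similar arguments, together with (185), lead to the third derivative of the MMSE which is obtained as (66).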